Tone Generation by Maximizing Joint Likelihood of Syllabic HMMs for Mandarin Speech Synthesis

Authors

  • Xingyu Na
  • Chaomin Wang
  • Xiang Xie
  • Jingming Kuang
  • Yaling He
Abstract

A tone generation method that maximizes the joint likelihood of syllabic HMMs is proposed to improve Mandarin speech synthesis. The F0 sequence is generated by jointly maximizing the likelihood of the state-level F0 model and the syllable-level tone model, under a constraint on the mean F0 of the adjacent units. The optimal weight of the tone component is searched for using the parameter generation error and the correlation coefficient as criteria. Objective and subjective evaluations both show a positive effect: the generation error is reduced by 26.7%, the correlation coefficient is increased by 6.5%, and prosody perception is significantly improved.
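The idea of weighting a syllable-level tone term against a state-level F0 term can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes diagonal Gaussian models, a hypothetical linear map `A` from the frame-level F0 contour to syllable-level tone features, and a scalar weight `w`. It omits dynamic features and the mean-F0 constraint on adjacent units. All function and parameter names are illustrative.

```python
import numpy as np


def generate_f0(mu_state, var_state, A, mu_tone, var_tone, w):
    """Return the contour c maximizing
    log N(c; mu_state, var_state) + w * log N(A c; mu_tone, var_tone).

    Both Gaussians are assumed diagonal, so the maximizer solves a
    linear system (setting the gradient of the objective to zero).
    """
    P_s = np.diag(1.0 / var_state)   # state-level precision
    P_t = np.diag(1.0 / var_tone)    # tone-level precision
    lhs = P_s + w * A.T @ P_t @ A
    rhs = P_s @ mu_state + w * A.T @ P_t @ mu_tone
    return np.linalg.solve(lhs, rhs)


def rmse(generated, reference):
    """Root-mean-square generation error between two contours."""
    return float(np.sqrt(np.mean((generated - reference) ** 2)))


def correlation(generated, reference):
    """Pearson correlation coefficient between two contours."""
    return float(np.corrcoef(generated, reference)[0, 1])


def search_weight(weights, reference, mu_state, var_state, A, mu_tone, var_tone):
    """Grid search over candidate weights; returns (w, error, correlation)
    for the weight with the lowest generation error. Using RMSE alone as
    the selection rule is an assumption of this sketch."""
    best = None
    for w in weights:
        c = generate_f0(mu_state, var_state, A, mu_tone, var_tone, w)
        result = (w, rmse(c, reference), correlation(c, reference))
        if best is None or result[1] < best[1]:
            best = result
    return best
```

With `w = 0` the tone term vanishes and the generated contour equals the state-level means; larger weights pull the contour toward the syllable-level tone model. In practice the weight would be tuned on held-out natural F0 contours rather than on a single reference.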

Similar resources

Prosodic Alternative Units in a Mandarin Chinese Speech Synthesizer

The Mandarin Chinese synthesis component of the Dresden Speech Synthesizer DreSS is based on an inventory of syllabic units. The inventory contains all Chinese syllables with the possible tones in up to three phonetic variations for a correct modeling of the cross syllable coarticulation effects. In order to improve the naturalness and fluency of the synthesized speech, the inventory was comple...

Generation of Fundamental Frequency Contours of Mandarin in HMM-based Speech Synthesis using Generation Process Model

The HMM-based speech synthesis system can produce high quality synthetic speech with flexible modeling of spectral and prosodic parameters. In this approach, short term spectra, fundamental frequency (F0) and duration are generated by multi-stream HMMs separately. However the quality of synthetic speech degrades when feature vectors used in training are noisy. Among all noisy features, pitch tr...

Quantitative analysis of F0 contours of emotional speech of Mandarin

For emotional speech synthesis, a quantitative model giving a parametric representation of F0 contours is needed. Purpose: to investigate quantitatively the F0 characteristics of Mandarin speech in four basic emotions (anger, fear, joy, and sadness) and in neutral reading. Two approaches are compared: surface feature analysis from time-normalized F0 contours, and analysis-by-synthesis of time-intact F0 co...

An HMM-Based Mandarin Chinese Text-To-Speech System

In this paper we present our Hidden Markov Model (HMM)-based, Mandarin Chinese Text-to-Speech (TTS) system. Mandarin Chinese or Putonghua, “the common spoken language”, is a tone language where each of the 400 plus base syllables can have up to 5 different lexical tone patterns. Their segmental and supra-segmental information is first modeled by 3 corresponding HMMs, including: (1) spectral env...

On the Duration of Mandarin Tones

The present study compared the duration of Mandarin tones in three types of speech contexts: isolated monosyllables, formal text-reading passages, and casual conversations. A total of 156 adult speakers was recruited. The speech materials included 44 monosyllables recorded from each of 121 participants, 18 passages read by 2 participants, and 20 conversations conducted by 33 participants. The d...

Publication date: 2012